[Macchiato] EDKII grub boot fails with PCIe init

Marcin Wojtas mw at semihalf.com
Wed Mar 14 08:06:02 GMT 2018


Hi Frederik,

Most likely the driver Ard pointed to requires configuration that is not
yet present in the branch. To reduce the number of variables, please
try the branch with the latest unmodified commit, as I asked before.
Please let me know how it goes and provide the full boot log.

Thanks,
Marcin

2018-03-14 8:03 GMT+01:00 Frederik Lotter <frederik.lotter at netronome.com>:
> On Tue, Mar 13, 2018 at 5:39 PM, Ard Biesheuvel <ard.biesheuvel at linaro.org>
> wrote:
>>
>> On 13 March 2018 at 15:31, Frederik Lotter
>> <frederik.lotter at netronome.com> wrote:
>> > On Tue, Mar 13, 2018 at 3:45 PM, Ard Biesheuvel
>> > <ard.biesheuvel at linaro.org>
>> > wrote:
>> >>
>> >> On 13 March 2018 at 13:44, Frederik Lotter
>> >> <frederik.lotter at netronome.com> wrote:
>> >> > On Tue, Mar 13, 2018 at 2:55 PM, Ard Biesheuvel
>> >> > <ard.biesheuvel at linaro.org>
>> >> > wrote:
>> >> >>
>> >> >> On 13 March 2018 at 12:48, Frederik Lotter
>> >> >> <frederik.lotter at netronome.com> wrote:
>> >> >> > On Tue, Mar 13, 2018 at 2:36 PM, Ard Biesheuvel
>> >> >> > <ard.biesheuvel at linaro.org>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On 13 March 2018 at 12:26, Frederik Lotter
>> >> >> >> <frederik.lotter at netronome.com> wrote:
>> >> >> >> > Hi Ard,
>> >> >> >> >
>> >> >> >> > On Mon, Mar 12, 2018 at 6:22 PM, Ard Biesheuvel
>> >> >> >> > <ard.biesheuvel at linaro.org>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> On 12 March 2018 at 16:15, Frederik Lotter
>> >> >> >> >> <frederik.lotter at netronome.com> wrote:
>> >> >> >> >> > Hi,
>> >> >> >> >> > I am getting CPU stall warnings when booting via the EFI route. I
>> >> >> >> >> > suspect the PCIe interface, as the stall warning sometimes contains
>> >> >> >> >> > the probe function. Other times it seems to get further than PCIe
>> >> >> >> >> > init, but it still stalls in interrupt handling.
>> >> >> >> >> > Here are some facts around my observation:
>> >> >> >> >> >
>> >> >> >> >> > I have two SD cards for my MACCHIATObin board. They have identical
>> >> >> >> >> > kernels (4.16-rc5) with an Ubuntu 16.04 rootfs. The first SD card
>> >> >> >> >> > boots via U-Boot, DT and kernel. The second SD card boots via EDKII,
>> >> >> >> >> > GRUB and kernel. The EDKII build includes the device tree DTB (and
>> >> >> >> >> > DTS, which I believe is unused) from the one used on the U-Boot SD
>> >> >> >> >> > card.
>> >> >> >> >> >
>> >> >> >> >> > EFI stub: Booting Linux Kernel...
>> >> >> >> >> > EFI stub: Using DTB from configuration table
>> >> >> >> >> > EFI stub: Exiting boot services and installing virtual address map...
>> >> >> >> >> > [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd081]
>> >> >> >> >> > [    0.000000] Linux version 4.16.0-rc5-mbcin-netronome-2-dirty (root at mcb1-cpt) (gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9)) #2 SMP PREEMPT Mon Mar 12 14:40:25 UTC 2018
>> >> >> >> >> > [    0.000000] Machine model: Marvell 8040 MACHIATOBin
>> >> >> >> >> > [    0.000000] efi: Getting EFI parameters from FDT:
>> >> >> >> >> > [    0.000000] efi: EFI v2.70 by EDK II
>> >> >> >> >> > [    0.000000] efi:  SMBIOS 3.0=0xbfd00000  ACPI 2.0=0xb6760000  MEMATTR=0xb8973418  RNG=0xbffdbf98
>> >> >> >> >> > [    0.000000] random: fast init done
>> >> >> >> >> > [    0.000000] efi: seeding entropy pool
>> >> >> >> >> > :
>> >> >> >> >> >
>> >> >> >> >> > (I am using the latest EDKII master, the Marvell edk2-open-platform
>> >> >> >> >> > 17.10 branch, with all the latest mv-ddr / atf / etc.)
>> >> >> >> >> >
>> >> >> >> >> > The DT data does appear in the EFI boot, but the PCIe interface
>> >> >> >> >> > fails, and results (I believe) in the CPU stall warnings:
>> >> >> >> >> >
>> >> >> >> >> > [  717.453025] INFO: rcu_preempt self-detected stall on CPU
>> >> >> >> >> > :
>> >> >> >> >> > :
>> >> >> >> >> > [  717.589783]  armada8k_pcie_probe+0x140/0x240
>> >> >> >> >> > :
>> >> >> >> >> >
>> >> >> >> >> > Other times, the pcie gets further:
>> >> >> >> >> >
>> >> >> >> >> > [    3.312127] PCI: OF: host bridge /cp0/pcie at f2600000 ranges:
>> >> >> >> >> > [    3.317740] PCI: OF:    IO 0xf9000000..0xf900ffff -> 0xf9000000
>> >> >> >> >> > [    3.323692] PCI: OF:   MEM 0xc0000000..0xdfffffff -> 0xc0000000
>> >> >> >> >> > [    3.328915] random: crng init done
>> >> >> >> >> > [    4.326158] armada8k-pcie f2600000.pcie: phy link never came up
>> >> >> >> >> > [    4.332109] armada8k-pcie f2600000.pcie: Link not up after reconfiguration
>> >> >> >> >> > [    4.339056] armada8k-pcie f2600000.pcie: PCI host bridge to bus 0000:00
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> To be brutally honest, the armada8k-pcie driver is a piece of junk,
>> >> >> >> >> and you're much better off using the generic ECAM driver, which now
>> >> >> >> >> includes special handling for the missing root port on Synopsys IP.
>> >> >> >> >>
>> >> >> >> >> It also allows you to have both MMIO32 and MMIO64 regions, which can
>> >> >> >> >> be useful with some PCIe cards with large BARs.
>> >> >> >> >>
>> >> >> >> >> Could you try
>> >> >> >> >>
>> >> >> >> >> compatible = "marvell,armada8k-pcie-ecam";
>> >> >> >> >>
>> >> >> >> >> in the DT node, please?
>> >> >> >> >>
>> >> >> >> >> (Before you do that, please check whether UEFI recognizes your PCI
>> >> >> >> >> hardware using the 'pci' command in the shell.)
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > This exercise helped a lot. Thank you for the proposal.
>> >> >> >> >
>> >> >> >> > So now I can consistently boot using U-Boot and EFI.
>> >> >> >> >
>> >> >> >> > However, the PCIe driver init fails. I have provided boot logs and
>> >> >> >> > also my DT entry. We need custom BAR ranges, and I am not sure
>> >> >> >> > whether this driver understands everything.
>> >> >> >> >
>> >> >> >> >  cp0_pcie0: pcie at f2600000 {
>> >> >> >> >   compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";
>> >> >> >> >   reg = <0 0xf2600000 0 0x10000>,
>> >> >> >> >         <0 ((0xf6000000 + (0 * 0x1000000)) + 0xf00000) 0 0x80000>;
>> >> >> >> >   reg-names = "ctrl", "config";
>> >> >> >> >   #address-cells = <3>;
>> >> >> >> >   #size-cells = <2>;
>> >> >> >> >   #interrupt-cells = <1>;
>> >> >> >> >   device_type = "pci";
>> >> >> >> >   dma-coherent;
>> >> >> >> >   msi-parent = <&gic_v2m0>;
>> >> >> >> >
>> >> >> >> >   bus-range = <0 0xff>;
>> >> >> >> >   ranges =
>> >> >> >> >   <0x81000000 0 (0xf9000000 + (0 * 0x10000)) 0 (0xf9000000 + (0 * 0x10000)) 0 0x10000
>> >> >> >> >    0x82000000 0 (0xf6000000 + (0 * 0x1000000)) 0 (0xf6000000 + (0 * 0x1000000)) 0 0xf00000>;
>> >> >> >> >   interrupt-map-mask = <0 0 0 0>;
>> >> >> >> >   interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;
>> >> >> >> >   interrupts = <0x0 22 4>;
>> >> >> >> >   num-lanes = <1>;
>> >> >> >> >   clocks = <&cp0_clk 1 13>;
>> >> >> >> >   status = "disabled";
>> >> >> >> >  };
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > Error:
>> >> >> >> >
>> >> >> >> > [    1.396968] PCI: OF: host bridge /cp0/pcie at f2600000 ranges:
>> >> >> >> > [    1.396979] PCI: OF:    IO 0xf9000000..0xf900ffff -> 0xf9000000
>> >> >> >> > [    1.396984] PCI: OF:   MEM 0xc0000000..0xdfffffff -> 0xc0000000
>> >> >> >> > [    1.396998] pci-host-generic f2600000.pcie: ECAM area [mem 0xf2600000-0xf260ffff] can only accommodate [bus 00-ffffffffffffffff] (reduced from [bus 00-ff] desired)
>> >> >> >> > [    1.397002] pci-host-generic f2600000.pcie: ECAM ioremap failed
>> >> >> >> > [    1.397011] pci-host-generic: probe of f2600000.pcie failed with error -12
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > Thanks for the support.
>> >> >> >> >
>> >> >> >>
>> >> >> >> Please try the following config
>> >> >> >>
>> >> >> >> cp0_pcie0: pcie at e0000000 {
>> >> >> >>    compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";
>> >> >> >>    reg = <0 0xe0000000 0 0xff00000>;
>> >> >> >>    #address-cells = <3>;
>> >> >> >>    #size-cells = <2>;
>> >> >> >>    #interrupt-cells = <1>;
>> >> >> >>    device_type = "pci";
>> >> >> >>    dma-coherent;
>> >> >> >>    msi-parent = <&gic_v2m0>;
>> >> >> >>
>> >> >> >>    bus-range = <0 0xfe>;
>> >> >> >>    ranges = <0x1000000 0x0 0x00000000 0x0 0xeff00000 0x0 0x00010000>,
>> >> >> >>             <0x2000000 0x0 0xc0000000 0x0 0xc0000000 0x0 0x20000000>,
>> >> >> >>             <0x3000000 0x8 0x00000000 0x8 0x00000000 0x1 0x00000000>;
>> >> >> >>
>> >> >> >>    interrupt-map-mask = <0 0 0 0>;
>> >> >> >>    interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;
>> >> >> >> };
>> >> >> >
>> >> >> >
>> >> >> > I am trying it now.
>> >> >> >
>> >> >> > Could you give me some insight into how the peripheral base address
>> >> >> > can simply be modified like that?
>> >> >> >
>> >> >> > Is there a mapping change somewhere?
>> >> >> >
>> >> >>
>> >> >> All those addresses are configurable, and the default armada8k-pcie
>> >> >> driver sets up all the translation windows from scratch (in a rather
>> >> >> limited way, mind you)
>> >> >>
>> >> >> The armada8k-pcie-ecam driver just reuses the configuration set by
>> >> >> the firmware, allowing for a larger bus range and an additional 4 GB
>> >> >> window for 64-bit MMIO.
>> >> >
>> >> >
>> >> > The new DTS extract:
>> >> >
>> >> > cp0_pcie0: pcie at 0xe0000000 {
>> >> >   compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";
>> >> >   reg = <0 0xe0000000 0 0x10000>;
>> >> >
>> >> >   #address-cells = <3>;
>> >> >   #size-cells = <2>;
>> >> >   #interrupt-cells = <1>;
>> >> >   device_type = "pci";
>> >> >   dma-coherent;
>> >> >   msi-parent = <&gic_v2m0>;
>> >> >
>> >> >   bus-range = <0 0xfe>;
>> >> >   ranges = <0x1000000 0x0 0x00000000 0x0 0xeff00000 0x0 0x00010000>,
>> >> >    <0x2000000 0x0 0xc0000000 0x0 0xc0000000 0x0 0x20000000>,
>> >> >    <0x3000000 0x8 0x00000000 0x8 0x00000000 0x1 0x00000000>;
>> >> >
>> >> >   interrupt-map-mask = <0 0 0 0>;
>> >> >   interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;
>> >> >  };
>> >> >
>> >> > The result:
>> >> >
>> >> > [    1.463594] PCI: OF: host bridge /cp0/pcie at 0xe0000000 ranges:
>> >> > [    1.463608] PCI: OF:    IO 0xeff00000..0xeff0ffff -> 0x00000000
>> >> > [    1.463616] PCI: OF:   MEM 0xc0000000..0xdfffffff -> 0xc0000000
>> >> > [    1.463622] PCI: OF:   MEM 0x800000000..0x8ffffffff -> 0x800000000
>> >> > [    1.463638] pci-host-generic e0000000.pcie: ECAM area [mem 0xe0000000-0xe000ffff] can only accommodate [bus 00-ffffffffffffffff] (reduced from [bus 00-fe] desired)
>> >> > [    1.463646] pci-host-generic e0000000.pcie: ECAM ioremap failed
>> >> > [    1.463657] pci-host-generic: probe of e0000000.pcie failed with error -12
>> >> >
>> >> >
>> >>
>> >> Please use the size I suggested for the 'reg' property
>> >
>> >
>> > Sorry I missed that:
>> >
>> > [    1.463413] PCI: OF: host bridge /cp0/pcie at 0xe0000000 ranges:
>> > [    1.463427] PCI: OF:    IO 0xeff00000..0xeff0ffff -> 0x00000000
>> > [    1.463435] PCI: OF:   MEM 0xc0000000..0xdfffffff -> 0xc0000000
>> > [    1.463442] PCI: OF:   MEM 0x800000000..0x8ffffffff -> 0x800000000
>> > [    1.463481] pci-host-generic e0000000.pcie: ECAM at [mem 0xe0000000-0xefefffff] for [bus 00-fe]
>> > [    1.463525] pci-host-generic e0000000.pcie: PCI host bridge to bus 0000:00
>> > [    1.463531] pci_bus 0000:00: root bus resource [bus 00-fe]
>> > [    1.463536] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
>> > [    1.463541] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff]
>> > [    1.463547] pci_bus 0000:00: root bus resource [mem 0x800000000-0x8ffffffff]
>> >
>> >
>> > So I assume this works, and I am super grateful. I will test it tomorrow
>> > with our Smart NIC.
>> >
>>
>> This doesn't tell you all that much, to be honest. But at least the
>> numbers look sane now, and appear to match the UEFI configuration.
>
>
> Perhaps I was too optimistic too quickly.
>
> root at localhost:~# lspci
> root at localhost:~# lspci -v
> root at localhost:~# echo 1 > /sys/bus/pci/
> devices/            drivers_probe       slots/
> drivers/            rescan              uevent
> drivers_autoprobe   resource_alignment
> root at localhost:~# echo 1 > /sys/bus/pci/rescan
>
> [  176.977408] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  176.983100]  1-...0: (1 GPs behind) idle=242/1/4611686018427387904 softirq=3435/3435 fqs=22
> [  176.991572]  (detected by 2, t=5375 jiffies, g=1169, c=1168, q=60)
>
>>
>>
>> > However, we are building a product that obviously requires long term
>> > maintenance, so may I please get your input on a strategy with this?
>> >
>> > If we decide to stick with this driver, would it be easy for things to
>> > become disjointed?
>> >
>> > The hope with going the EFI route is that we could boot "generic" Ubuntu
>> > and CentOS installs, so I guess as long as we keep the DT and the EDKII
>> > snapshot in sync on our side, the risk is low.
>> >
>>
>> I'm afraid you are getting caught in the middle of a philosophical
>> debate here: many engineers that are involved with the Marvell support
>> in Linux feel that a device tree is not something that should be
>> supported long term, and needs to be bundled with the OS. Over the
>> last couple of kernel releases, the Marvell 8040 support was changed
>> in a non-backward compatible manner numerous times.
>>
>> This conflicts badly with the idea that the firmware provides the
>> hardware description (using DT or ACPI), and that the contract with
>> the OS is kept by both sides for longer than a single release.
>>
>> So I cannot really answer that question, unfortunately. If you don't
>> intend to use the onboard network controller, you could go the ACPI
>> route, I guess.
>>
>> Another problem is that none of this UEFI/ACPI support is upstream in
>> the Tianocore project, and trying random trees left and right doesn't
>> really help when assessing whether a platform is suitable as a long
>> term investment.
>>
>>
>>
>> > For example, using the same DT with uboot, it fails:
>> >
>> > [    0.294942] sysfs: cannot create duplicate filename '/bus/platform/devices/e0000000.pcie'
>> > [    0.294950] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-mbcin-netronome-2-dirty #2
>> > [    0.294952] Hardware name: Marvell 8040 MACHIATOBin (DT)
>> > [    0.294955] Call trace:
>> > [    0.294967]  dump_backtrace+0x0/0x150
>> > [    0.294970]  show_stack+0x14/0x20
>> > [    0.294976]  dump_stack+0x98/0xbc
>> > [    0.294980]  sysfs_warn_dup+0x60/0x78
>> > [    0.294983]  sysfs_do_create_link_sd.isra.0+0xd8/0xe0
>> > [    0.294986]  sysfs_create_link+0x20/0x40
>> > [    0.294990]  bus_add_device+0x88/0x148
>> > [    0.294993]  device_add+0x394/0x568
>> > [    0.294997]  of_device_add+0x5c/0x70
>> > [    0.295000]  of_platform_device_create_pdata+0x80/0xd0
>> > [    0.295003]  of_platform_bus_create+0xdc/0x300
>> > [    0.295006]  of_platform_bus_create+0x11c/0x300
>> > [    0.295008]  of_platform_populate+0x4c/0xb0
>> > [    0.295014]  of_platform_default_populate_init+0xa4/0xc0
>> > [    0.295017]  do_one_initcall+0x38/0x120
>> > [    0.295020]  kernel_init_freeable+0x134/0x1d4
>> > [    0.295025]  kernel_init+0x10/0x100
>> > [    0.295028]  ret_from_fork+0x10/0x18
>> >
>> > So I think this confirms that the PCIe setup is different between EDKII
>> > and U-Boot (unless I am doing something stupid here).
>> >
>>
>> It looks like you have two copies of the pcie node here, no?
>
>
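
A note on the 'reg' size discussed in the quoted thread: the two failed
probes and the successful one are consistent with the generic ECAM layout,
in which each bus occupies 1 MiB of configuration space (32 devices x 8
functions x 4 KiB). A window of 0x10000 bytes is then too small for even
one bus, which would explain the wrapped-around "bus 00-ffffffffffffffff"
in the log, while buses 0x00-0xfe need 0xff00000 bytes. The Python sketch
below only reproduces that arithmetic. The function names are made up for
illustration, and it is not the kernel's code.

```python
# Sketch only: reproduces the ECAM sizing arithmetic, assuming the standard
# layout of 1 MiB of configuration space per bus. All names are hypothetical.

ECAM_BUS_SHIFT = 20  # 32 devices * 8 functions * 4 KiB = 1 MiB per bus


def ecam_size(bus_start: int, bus_end: int) -> int:
    """Bytes of ECAM space needed to cover buses bus_start..bus_end."""
    return (bus_end - bus_start + 1) << ECAM_BUS_SHIFT


def last_bus(bus_start: int, reg_size: int) -> int:
    """Last bus number that a 'reg' window of reg_size bytes can reach.

    Returns bus_start - 1 when the window is too small for a single bus.
    """
    return bus_start + (reg_size >> ECAM_BUS_SHIFT) - 1


def as_u64(value: int) -> int:
    """Wrap a possibly negative value like an unsigned 64-bit field would."""
    return value & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # bus-range = <0 0xfe> needs the size suggested in the thread.
    print(hex(ecam_size(0x00, 0xFE)))             # 0xff00000
    # With that size the window ends where the working log says it does.
    print(hex(0xE0000000 + 0xFF00000 - 1))        # 0xefefffff
    # A 0x10000 window holds zero buses, so the end wraps below bus 0.
    print(hex(as_u64(last_bus(0x00, 0x10000))))   # 0xffffffffffffffff
```

With the suggested size, the ECAM window ends at 0xefefffff and the 64 KiB
I/O window at 0xeff00000 follows it directly.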

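For anyone comparing the 'ranges' property in the quoted DT node with the
"PCI: OF:" lines in the working boot log: assuming the usual PCI bus
binding layout (three PCI address cells, the first of which carries the
address-space code in bits 24-25, followed here by two CPU address cells
and two size cells), a small decoder gives the same windows. This is an
illustrative sketch, not the kernel's parser. decode_ranges, SPACE and
SUGGESTED_RANGES are hypothetical names.

```python
# Illustrative decoder for a PCI host bridge 'ranges' property.
# Assumption: 7 cells per entry (3 PCI address + 2 CPU address + 2 size),
# as implied by #address-cells = <3>, #size-cells = <2> and the 4-cell 'reg'.

# Address-space code in bits 24-25 of the first PCI address cell.
SPACE = {0b00: "CFG", 0b01: "IO", 0b10: "MEM32", 0b11: "MEM64"}

# The three entries from the DT node suggested in the thread, flattened.
SUGGESTED_RANGES = [
    0x1000000, 0x0, 0x00000000, 0x0, 0xEFF00000, 0x0, 0x00010000,
    0x2000000, 0x0, 0xC0000000, 0x0, 0xC0000000, 0x0, 0x20000000,
    0x3000000, 0x8, 0x00000000, 0x8, 0x00000000, 0x1, 0x00000000,
]


def decode_ranges(cells):
    """Return (space, cpu_start, cpu_end, pci_start) for each 7-cell entry."""
    if len(cells) % 7 != 0:
        raise ValueError("expected a multiple of 7 cells")
    entries = []
    for i in range(0, len(cells), 7):
        hi, pci_mid, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = cells[i:i + 7]
        space = SPACE[(hi >> 24) & 0x3]
        pci_start = (pci_mid << 32) | pci_lo
        cpu_start = (cpu_hi << 32) | cpu_lo
        size = (size_hi << 32) | size_lo
        entries.append((space, cpu_start, cpu_start + size - 1, pci_start))
    return entries


if __name__ == "__main__":
    for space, cpu_start, cpu_end, pci_start in decode_ranges(SUGGESTED_RANGES):
        print(f"{space:>5} {cpu_start:#010x}..{cpu_end:#010x} -> {pci_start:#010x}")
```

The decoded CPU and PCI addresses agree with the IO and two MEM windows
reported in the log once the 'reg' size was corrected.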

